Overview

Dataset statistics

Number of variables: 17
Number of observations: 5228430
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 438.8 MiB
Average record size in memory: 88.0 B
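The 88.0 B average record size can be traced back to the per-variable memory sizes listed under Variables. The sketch below assumes each column's memory size counts a fixed number of bytes per row, so that object columns are counted shallowly as 8-byte pointers. It converts each printed MiB figure back to whole bytes per row and sums them. The DateTime variable's name is not preserved in this export, so its key is a placeholder.

```python
MIB = 1024 ** 2
N_ROWS = 5_228_430  # "Number of observations"

# Per-variable "Memory size" values (MiB) as printed under Variables.
# "(DateTime variable)" is a placeholder key; the real column name is not shown.
memory_mib = {
    "track_id": 39.9, "(DateTime variable)": 39.9, "race_number": 5.0,
    "program_number": 39.9, "trakus_index": 10.0, "latitude": 39.9,
    "longitude": 39.9, "distance_id": 10.0, "course_type": 39.9,
    "track_condition": 39.9, "run_up_distance": 5.0, "race_type": 39.9,
    "purse": 19.9, "post_time": 10.0, "weight_carried": 10.0,
    "jockey": 39.9, "odds": 10.0,
}

# The printed sizes are rounded to 0.1 MiB, so round back to whole bytes per row
# (assumed to be 1, 2, 4 or 8 for fixed-width columns and object pointers).
bytes_per_row = {name: round(mib * MIB / N_ROWS) for name, mib in memory_mib.items()}

record_size = sum(bytes_per_row.values())
total_mib = record_size * N_ROWS / MIB
print(record_size)             # 88
print(f"{total_mib:.1f} MiB")  # 438.8 MiB
```

Under that assumption, nine 8-byte columns, one 4-byte column, five 2-byte columns and two 1-byte columns add up to exactly 88 bytes per row. That sum reproduces both the average record size and the 438.8 MiB total.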

Variable types

Categorical: 6
DateTime: 1
Numeric: 10

Alerts

jockey has a high cardinality: 178 distinct values (High cardinality)
race_number is highly correlated with post_time (High correlation)
latitude is highly correlated with longitude (High correlation)
longitude is highly correlated with latitude (High correlation)
post_time is highly correlated with race_number (High correlation)
distance_id is highly correlated with weight_carried (High correlation)
weight_carried is highly correlated with distance_id (High correlation)
track_id is highly correlated with latitude and 1 other field (High correlation)
trakus_index is highly correlated with distance_id and 1 other field (High correlation)
latitude is highly correlated with track_id and 1 other field (High correlation)
longitude is highly correlated with track_id and 1 other field (High correlation)
distance_id is highly correlated with trakus_index and 3 other fields (High correlation)
course_type is highly correlated with trakus_index and 4 other fields (High correlation)
track_condition is highly correlated with course_type (High correlation)
run_up_distance is highly correlated with distance_id and 1 other field (High correlation)
weight_carried is highly correlated with distance_id and 1 other field (High correlation)
run_up_distance has 59344 (1.1%) zeros (Zeros)
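The "High correlation" alerts are presumably raised when a pairwise correlation coefficient exceeds a threshold set in the report's configuration. This export does not show which correlation methods or thresholds were used. The sketch below therefore assumes a single Spearman rank correlation and a hypothetical threshold of 0.9, and it only covers numeric columns. The toy columns are made-up values, not data from this report.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def high_correlation_alerts(columns, threshold=0.9):
    """Return (a, b, rho) for every column pair with |rho| >= threshold (hypothetical rule)."""
    alerts = []
    for a, b in combinations(columns, 2):
        rho = spearman(columns[a], columns[b])
        if abs(rho) >= threshold:
            alerts.append((a, b, round(rho, 3)))
    return alerts


# Made-up toy columns for illustration only.
toy = {
    "race_number": [1, 2, 3, 4, 5, 6],
    "post_time": [100, 130, 200, 230, 300, 330],
    "odds": [500, 120, 900, 300, 700, 200],
}
print(high_correlation_alerts(toy))  # [('race_number', 'post_time', 1.0)]
```

Rank-based correlation flags any monotonic relationship, not only a linear one. That is one reason an ordinal column such as race_number can be flagged against post_time.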

Reproduction

Analysis started: 2022-08-18 10:10:13.040521
Analysis finished: 2022-08-18 10:14:37.992291
Duration: 4 minutes and 24.95 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
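The reported duration is the difference between the two timestamps above. A quick check with the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2022-08-18 10:10:13.040521")
finished = datetime.fromisoformat("2022-08-18 10:14:37.992291")

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
duration = f"{int(minutes)} minutes and {seconds:.2f} seconds"
print(duration)  # 4 minutes and 24.95 seconds
```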

Variables

track_id
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.9 MiB
AQU  2158369
BEL  1947134
SAR  1122927

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 15685290
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AQU
2nd row: AQU
3rd row: AQU
4th row: AQU
5th row: AQU

Common Values

Value  Count  Frequency (%)
AQU  2158369  41.3%
BEL  1947134  37.2%
SAR  1122927  21.5%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
aqu  2158369  41.3%
bel  1947134  37.2%
sar  1122927  21.5%

Most occurring characters

Value  Count  Frequency (%)
A  3281296  20.9%
Q  2158369  13.8%
U  2158369  13.8%
B  1947134  12.4%
E  1947134  12.4%
L  1947134  12.4%
S  1122927  7.2%
R  1122927  7.2%
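The character counts follow directly from the category counts. Every occurrence of a three-letter code contributes each of its letters once. For example, A appears in both AQU and SAR, so its count is 2158369 + 1122927. A minimal sketch that rebuilds the table above, using only the counts from Common Values:

```python
from collections import Counter

N_ROWS = 5_228_430
# track_id category counts from "Common Values" above.
category_counts = {"AQU": 2_158_369, "BEL": 1_947_134, "SAR": 1_122_927}
assert sum(category_counts.values()) == N_ROWS  # no missing values, so every row is counted

# Each occurrence of a category contributes each of its characters once.
char_counts = Counter()
for value, count in category_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())  # 3 characters x 5,228,430 rows
for ch, count in char_counts.most_common():
    print(f"{ch}  {count}  {100 * count / total_chars:.1f}%")  # first line: A  3281296  20.9%
```

The same division by the row count also yields the 41.3% / 37.2% / 21.5% shares in Common Values.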

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  15685290  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
A  3281296  20.9%
Q  2158369  13.8%
U  2158369  13.8%
B  1947134  12.4%
E  1947134  12.4%
L  1947134  12.4%
S  1122927  7.2%
R  1122927  7.2%

Most occurring scripts

Value  Count  Frequency (%)
Latin  15685290  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
A  3281296  20.9%
Q  2158369  13.8%
U  2158369  13.8%
B  1947134  12.4%
E  1947134  12.4%
L  1947134  12.4%
S  1122927  7.2%
R  1122927  7.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15685290  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
A  3281296  20.9%
Q  2158369  13.8%
U  2158369  13.8%
B  1947134  12.4%
E  1947134  12.4%
L  1947134  12.4%
S  1122927  7.2%
R  1122927  7.2%

DateTime

Distinct: 217
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.9 MiB
Minimum: 2019-01-01 00:00:00
Maximum: 2019-12-31 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]
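If every value falls at midnight, as the minimum and maximum do, then the 217 distinct values are calendar dates. That is an assumption, since only the extremes are shown. Under it, the data cover 217 of the 365 days between the minimum and maximum:

```python
from datetime import date

first, last = date(2019, 1, 1), date(2019, 12, 31)  # Minimum / Maximum above
calendar_days = (last - first).days + 1             # inclusive of both ends
distinct_dates = 217                                # "Distinct" above
print(calendar_days, f"{100 * distinct_dates / calendar_days:.1f}%")  # 365 59.5%
```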

race_number
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.505408889
Minimum: 1
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 6
Q3: 8
95th percentile: 10
Maximum: 13
Range: 12
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.860655522
Coefficient of variation (CV): 0.5196081852
Kurtosis: -0.980067865
Mean: 5.505408889
Median Absolute Deviation (MAD): 2
Skewness: 0.07692640069
Sum: 28784645
Variance: 8.183350015
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=13)]

Common Values

Value  Count  Frequency (%)
8  582413  11.1%
6  575217  11.0%
5  559884  10.7%
7  554623  10.6%
4  523846  10.0%
2  515115  9.9%
1  508808  9.7%
9  492915  9.4%
3  490256  9.4%
10  283141  5.4%
Other values (3)  142212  2.7%

Minimum 10 values

Value  Count  Frequency (%)
1  508808  9.7%
2  515115  9.9%
3  490256  9.4%
4  523846  10.0%
5  559884  10.7%
6  575217  11.0%
7  554623  10.6%
8  582413  11.1%
9  492915  9.4%
10  283141  5.4%

Maximum 10 values

Value  Count  Frequency (%)
13  11683  0.2%
12  31725  0.6%
11  98804  1.9%
10  283141  5.4%
9  492915  9.4%
8  582413  11.1%
7  554623  10.6%
6  575217  11.0%
5  559884  10.7%
4  523846  10.0%
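Between them, the three tables above list all 13 distinct race_number values. The summary statistics can therefore be recomputed from the counts alone. The sketch below assumes the report uses the sample variance (n - 1 denominator) and linearly interpolated quantiles. The recomputed variance of about 8.18335 matches the report only under the n - 1 convention.

```python
import math

# race_number value counts, combined from the three tables above
# (values 11-13 appear only in the "Maximum 10 values" table).
counts = {
    1: 508_808, 2: 515_115, 3: 490_256, 4: 523_846, 5: 559_884,
    6: 575_217, 7: 554_623, 8: 582_413, 9: 492_915, 10: 283_141,
    11: 98_804, 12: 31_725, 13: 11_683,
}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n
# Sample variance (n - 1 denominator), written in terms of sums of powers.
sum_sq = sum(value * value * count for value, count in counts.items())
variance = (sum_sq - total * total / n) / (n - 1)
std = math.sqrt(variance)


def value_at(sorted_items, index):
    """Element at 0-based position `index` of the expanded, sorted data."""
    cumulative = 0
    for value, count in sorted_items:
        cumulative += count
        if index < cumulative:
            return value
    raise IndexError(index)


def quantile(counts, q):
    """Quantile by linear interpolation between order statistics, from counts."""
    items = sorted(counts.items())
    size = sum(counts.values())
    position = q * (size - 1)
    lo, hi = math.floor(position), math.ceil(position)
    a, b = value_at(items, lo), value_at(items, hi)
    return a + (b - a) * (position - lo)


print(n, total)  # 5228430 28784645
print(f"mean={mean:.9f} std={std:.9f} cv={std / mean:.10f}")
print([quantile(counts, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)])  # [1.0, 3.0, 6.0, 8.0, 10.0]
```

The counts sum to the 5228430 observations. The weighted total equals the reported Sum, which confirms that the value/count pairs above were separated correctly.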

program_number
Categorical

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.9 MiB
4  611396
3  611025
5  606666
2  602597
1  599979
Other values (15)  2196767

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 15685290
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 6
2nd row: 6
3rd row: 6
4th row: 6
5th row: 6

Common Values

Value  Count  Frequency (%)
4  611396  11.7%
3  611025  11.7%
5  606666  11.6%
2  602597  11.5%
1  599979  11.5%
6  579893  11.1%
7  486905  9.3%
8  373202  7.1%
9  266526  5.1%
10  186835  3.6%
Other values (10)  303406  5.8%

Length

[Figure: Histogram of lengths of the category]
Value  Count  Frequency (%)
4  611396  11.7%
3  611025  11.7%
5  606666  11.6%
2  602597  11.5%
1  599979  11.5%
6  579893  11.1%
7  486905  9.3%
8  373202  7.1%
9  266526  5.1%
10  186835  3.6%
Other values (10)  303406  5.8%

Most occurring characters

Value  Count  Frequency (%)
(space)  9966619  63.5%
1  1203337  7.7%
2  678041  4.3%
3  637866  4.1%
4  626477  4.0%
5  614583  3.9%
6  583484  3.7%
7  486905  3.1%
8  373202  2.4%
9  266526  1.7%
Other values (4)  248250  1.6%

Most occurring categories

Value  Count  Frequency (%)
Space Separator  9966619  63.5%
Decimal Number  5657256  36.1%
Uppercase Letter  61415  0.4%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  1203337  21.3%
2  678041  12.0%
3  637866  11.3%
4  626477  11.1%
5  614583  10.9%
6  583484  10.3%
7  486905  8.6%
8  373202  6.6%
9  266526  4.7%
0  186835  3.3%
Uppercase Letter
Value  Count  Frequency (%)
A  56753  92.4%
B  4081  6.6%
X  581  0.9%
Space Separator
Value  Count  Frequency (%)
(space)  9966619  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  15623875  99.6%
Latin  61415  0.4%

Most frequent character per script

Common
Value  Count  Frequency (%)
(space)  9966619  63.8%
1  1203337  7.7%
2  678041  4.3%
3  637866  4.1%
4  626477  4.0%
5  614583  3.9%
6  583484  3.7%
7  486905  3.1%
8  373202  2.4%
9  266526  1.7%
Latin
Value  Count  Frequency (%)
A  56753  92.4%
B  4081  6.6%
X  581  0.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15685290  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(space)  9966619  63.5%
1  1203337  7.7%
2  678041  4.3%
3  637866  4.1%
4  626477  4.0%
5  614583  3.9%
6  583484  3.7%
7  486905  3.1%
8  373202  2.4%
9  266526  1.7%
Other values (4)  248250  1.6%

trakus_index
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1062
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 183.8652125
Minimum: 1
Maximum: 1062
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.0 MiB

Quantile statistics

Minimum: 1
5th percentile: 18
Q1: 88
Median: 176
Q3: 264
95th percentile: 377
Maximum: 1062
Range: 1061
Interquartile range (IQR): 176

Descriptive statistics

Standard deviation: 118.3326899
Coefficient of variation (CV): 0.6435838964
Kurtosis: 2.233086316
Mean: 183.8652125
Median Absolute Deviation (MAD): 88
Skewness: 0.8361103605
Sum: 961326393
Variance: 14002.62549
Monotonicity: Not monotonic
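The Median Absolute Deviation here appears to be the unscaled version: the median of absolute deviations from the median. The integer MAD values reported for integer-valued columns suggest this, but it is an assumption. A minimal sketch on made-up values shows how the measure ignores a single extreme value:

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled MAD: median of |x - median(x)| (no 1.4826 consistency factor)."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Made-up values with one outlier; the MAD stays small.
print(median_absolute_deviation([1, 2, 3, 4, 100]))  # 1
```

For a symmetric distribution the unscaled MAD equals half the IQR. For trakus_index the reported MAD (88) is exactly half of the IQR (176).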
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
1  14915  0.3%
2  14915  0.3%
3  14915  0.3%
4  14915  0.3%
8  14914  0.3%
15  14914  0.3%
16  14914  0.3%
9  14914  0.3%
10  14914  0.3%
11  14914  0.3%
Other values (1052)  5079286  97.1%

Minimum 10 values

Value  Count  Frequency (%)
1  14915  0.3%
2  14915  0.3%
3  14915  0.3%
4  14915  0.3%
5  14914  0.3%
6  14914  0.3%
7  14914  0.3%
8  14914  0.3%
9  14914  0.3%
10  14914  0.3%

Maximum 10 values

Value  Count  Frequency (%)
1062  9  < 0.1%
1061  9  < 0.1%
1060  9  < 0.1%
1059  9  < 0.1%
1058  9  < 0.1%
1057  9  < 0.1%
1056  9  < 0.1%
1055  9  < 0.1%
1054  9  < 0.1%
1053  9  < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5224784
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.20386704
Minimum: 40.6667108
Maximum: 43.07399173
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.9 MiB

Quantile statistics

Minimum: 40.6667108
5th percentile: 40.67021298
Q1: 40.67345603
Median: 40.71393361
Q3: 40.71739324
95th percentile: 43.07299548
Maximum: 43.07399173
Range: 2.407280932
Interquartile range (IQR): 0.04393720806

Descriptive statistics

Standard deviation: 0.9771226713
Coefficient of variation (CV): 0.02371434386
Kurtosis: -0.07199889901
Mean: 41.20386704
Median Absolute Deviation (MAD): 0.03990861543
Skewness: 1.387718403
Sum: 215431534.5
Variance: 0.9547687148
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
43.07179234  98  < 0.1%
40.66672127  46  < 0.1%
43.06865706  31  < 0.1%
43.06869702  31  < 0.1%
43.068705  31  < 0.1%
43.06867304  31  < 0.1%
43.06868104  31  < 0.1%
43.06871299  31  < 0.1%
43.06866505  31  < 0.1%
43.06864907  31  < 0.1%
Other values (5224774)  5228038  > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
40.6667108  2  < 0.1%
40.66671874  3  < 0.1%
40.66672066  4  < 0.1%
40.66672127  46  < 0.1%
40.66672224  3  < 0.1%
40.66672262  3  < 0.1%
40.66672633  3  < 0.1%
40.66672791  3  < 0.1%
40.66673098  4  < 0.1%
40.66673204  9  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
43.07399173  1  < 0.1%
43.0739917  1  < 0.1%
43.07399097  1  < 0.1%
43.07398948  1  < 0.1%
43.07398943  1  < 0.1%
43.07398563  1  < 0.1%
43.07398517  1  < 0.1%
43.07398091  1  < 0.1%
43.07398008  1  < 0.1%
43.07397977  1  < 0.1%

longitude
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5224799
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.77702389
Minimum: -73.83260149
Maximum: -73.7148265
Zeros: 0
Zeros (%): 0.0%
Negative: 5228430
Negative (%): 100.0%
Memory size: 39.9 MiB

Quantile statistics

Minimum: -73.83260149
5th percentile: -73.83173468
Q1: -73.82883411
Median: -73.76914831
Q3: -73.72563624
95th percentile: -73.72015512
Maximum: -73.7148265
Range: 0.1177749864
Interquartile range (IQR): 0.1031978737

Descriptive statistics

Standard deviation: 0.04711054248
Coefficient of variation (CV): -0.0006385530344
Kurtosis: -1.735802319
Mean: -73.77702389
Median Absolute Deviation (MAD): 0.04751605643
Skewness: -0.03086390547
Sum: -385738005
Variance: 0.002219403212
Monotonicity: Not monotonic
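The negative coefficient of variation is not an error. CV is the standard deviation divided by the mean, so it takes the sign of the mean, and every longitude here is negative. CV also depends on where zero sits. The sketch below re-expresses the same values as degrees east of 74°W, a hypothetical re-referencing chosen only for illustration. The spread is unchanged, but the CV becomes several hundred times larger. This suggests CV carries little meaning for coordinates.

```python
mean = -73.77702389   # Mean above
std = 0.04711054248   # Standard deviation above

cv = std / mean       # negative because the mean is negative
print(f"{cv:.10f}")   # -0.0006385530

# Hypothetical shift of origin: measure longitude in degrees east of 74°W.
shifted_mean = mean + 74.0
cv_shifted = std / shifted_mean
print(f"{cv_shifted:.3f}")  # 0.211
```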
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
-73.76465767  98  < 0.1%
-73.83021525  46  < 0.1%
-73.77195764  31  < 0.1%
-73.77196897  31  < 0.1%
-73.77194633  31  < 0.1%
-73.77192934  31  < 0.1%
-73.77194066  31  < 0.1%
-73.77191803  31  < 0.1%
-73.77196331  31  < 0.1%
-73.77192369  31  < 0.1%
Other values (5224789)  5228038  > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
-73.83260149  1  < 0.1%
-73.83260146  1  < 0.1%
-73.83260062  1  < 0.1%
-73.8326005  1  < 0.1%
-73.83259902  1  < 0.1%
-73.83259817  1  < 0.1%
-73.83259612  1  < 0.1%
-73.83259494  1  < 0.1%
-73.83259237  1  < 0.1%
-73.83259065  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
-73.7148265  1  < 0.1%
-73.71489734  8  < 0.1%
-73.7149337  1  < 0.1%
-73.71493525  1  < 0.1%
-73.71493911  1  < 0.1%
-73.71495822  1  < 0.1%
-73.71496318  7  < 0.1%
-73.71496709  1  < 0.1%
-73.71497419  1  < 0.1%
-73.71497811  1  < 0.1%

distance_id
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 756.3151271
Minimum: 450
Maximum: 2000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.0 MiB

Quantile statistics

Minimum: 450
5th percentile: 550
Q1: 600
Median: 800
Q3: 850
95th percentile: 1000
Maximum: 2000
Range: 1550
Interquartile range (IQR): 250

Descriptive statistics

Standard deviation: 179.6886889
Coefficient of variation (CV): 0.2375844175
Kurtosis: 13.04437961
Mean: 756.3151271
Median Absolute Deviation (MAD): 100
Skewness: 2.659565266
Sum: 3954340700
Variance: 32288.02492
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=20)]

Common Values

Value  Count  Frequency (%)
600  1194752  22.9%
800  1137856  21.8%
850  776832  14.9%
700  595436  11.4%
650  466190  8.9%
900  416372  8.0%
550  280966  5.4%
1000  87941  1.7%
1100  83223  1.6%
1200  45213  0.9%
Other values (10)  143649  2.7%

Minimum 10 values

Value  Count  Frequency (%)
450  2796  0.1%
500  25270  0.5%
550  280966  5.4%
600  1194752  22.9%
650  466190  8.9%
700  595436  11.4%
800  1137856  21.8%
850  776832  14.9%
900  416372  8.0%
950  35814  0.7%

Maximum 10 values

Value  Count  Frequency (%)
2000  9558  0.2%
1900  15315  0.3%
1800  6713  0.1%
1650  33963  0.6%
1600  5904  0.1%
1400  4655  0.1%
1200  45213  0.9%
1100  83223  1.6%
1050  3661  0.1%
1000  87941  1.7%
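The "Other values (10)" row in Common Values is just the rows not covered by the ten most common values. Here the ten values it hides all appear in the minimum and maximum tables, so all 20 distinct distance_id values are listed somewhere. A short check that reconstructs the row and confirms the reported Sum:

```python
N_ROWS = 5_228_430

# The ten most common distance_id values ("Common Values" above).
top10 = {600: 1_194_752, 800: 1_137_856, 850: 776_832, 700: 595_436, 650: 466_190,
         900: 416_372, 550: 280_966, 1000: 87_941, 1100: 83_223, 1200: 45_213}
# Values that appear only in the "Minimum 10 values" / "Maximum 10 values" tables.
rest = {450: 2_796, 500: 25_270, 950: 35_814, 1050: 3_661, 1400: 4_655,
        1600: 5_904, 1650: 33_963, 1800: 6_713, 1900: 15_315, 2000: 9_558}

other = N_ROWS - sum(top10.values())
print(f"Other values ({len(rest)})  {other}  {100 * other / N_ROWS:.1f}%")  # Other values (10)  143649  2.7%

counts = {**top10, **rest}
weighted_sum = sum(value * count for value, count in counts.items())
print(len(counts), weighted_sum)  # 20 3954340700
print(weighted_sum / N_ROWS)      # the Mean, about 756.3151271
```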

course_type
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.9 MiB
D  3229234
T  988274
I  752310
O  193063
M  65549

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5228430
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: D
2nd row: D
3rd row: D
4th row: D
5th row: D

Common Values

Value  Count  Frequency (%)
D  3229234  61.8%
T  988274  18.9%
I  752310  14.4%
O  193063  3.7%
M  65549  1.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
d  3229234  61.8%
t  988274  18.9%
i  752310  14.4%
o  193063  3.7%
m  65549  1.3%

Most occurring characters

Value  Count  Frequency (%)
D  3229234  61.8%
T  988274  18.9%
I  752310  14.4%
O  193063  3.7%
M  65549  1.3%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  5228430  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
D  3229234  61.8%
T  988274  18.9%
I  752310  14.4%
O  193063  3.7%
M  65549  1.3%

Most occurring scripts

Value  Count  Frequency (%)
Latin  5228430  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
D  3229234  61.8%
T  988274  18.9%
I  752310  14.4%
O  193063  3.7%
M  65549  1.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  5228430  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
D  3229234  61.8%
T  988274  18.9%
I  752310  14.4%
O  193063  3.7%
M  65549  1.3%

track_condition
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.9 MiB
FT  2345748
FM  1391865
GD  733124
SY  486795
MY  172943
Other values (2)  97955

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 15685290
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GD
2nd row: GD
3rd row: GD
4th row: GD
5th row: GD

Common Values

Value  Count  Frequency (%)
FT  2345748  44.9%
FM  1391865  26.6%
GD  733124  14.0%
SY  486795  9.3%
MY  172943  3.3%
YL  89250  1.7%
SF  8705  0.2%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
ft  2345748  44.9%
fm  1391865  26.6%
gd  733124  14.0%
sy  486795  9.3%
my  172943  3.3%
yl  89250  1.7%
sf  8705  0.2%

Most occurring characters

Value  Count  Frequency (%)
(space)  5228430  33.3%
F  3746318  23.9%
T  2345748  15.0%
M  1564808  10.0%
Y  748988  4.8%
G  733124  4.7%
D  733124  4.7%
S  495500  3.2%
L  89250  0.6%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  10456860  66.7%
Space Separator  5228430  33.3%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
F  3746318  35.8%
T  2345748  22.4%
M  1564808  15.0%
Y  748988  7.2%
G  733124  7.0%
D  733124  7.0%
S  495500  4.7%
L  89250  0.9%
Space Separator
Value  Count  Frequency (%)
(space)  5228430  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  10456860  66.7%
Common  5228430  33.3%

Most frequent character per script

Latin
Value  Count  Frequency (%)
F  3746318  35.8%
T  2345748  22.4%
M  1564808  15.0%
Y  748988  7.2%
G  733124  7.0%
D  733124  7.0%
S  495500  4.7%
L  89250  0.9%
Common
Value  Count  Frequency (%)
(space)  5228430  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15685290  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(space)  5228430  33.3%
F  3746318  23.9%
T  2345748  15.0%
M  1564808  10.0%
Y  748988  4.8%
G  733124  4.7%
D  733124  4.7%
S  495500  3.2%
L  89250  0.6%

run_up_distance
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 126
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.11571504
Minimum: -128
Maximum: 126
Zeros: 59344
Zeros (%): 1.1%
Negative: 454007
Negative (%): 8.7%
Memory size: 5.0 MiB

Quantile statistics

Minimum: -128
5th percentile: -92
Q1: 44
Median: 52
Q3: 72
95th percentile: 101
Maximum: 126
Range: 254
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 48.31769345
Coefficient of variation (CV): 1.070972574
Kurtosis: 4.389318633
Mean: 45.11571504
Median Absolute Deviation (MAD): 12
Skewness: -2.018172041
Sum: 235884358
Variance: 2334.5995
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
54  798776  15.3%
45  396175  7.6%
48  345963  6.6%
52  235485  4.5%
44  197209  3.8%
72  186176  3.6%
32  153818  2.9%
40  141682  2.7%
90  141586  2.7%
50  106533  2.0%
Other values (116)  2525027  48.3%

Minimum 10 values

Value  Count  Frequency (%)
-128  13979  0.3%
-126  20732  0.4%
-122  4664  0.1%
-121  38563  0.7%
-120  5148  0.1%
-118  30326  0.6%
-116  9101  0.2%
-114  24091  0.5%
-112  20582  0.4%
-108  7273  0.1%

Maximum 10 values

Value  Count  Frequency (%)
126  59268  1.1%
125  3104  0.1%
124  3432  0.1%
123  7248  0.1%
122  2680  0.1%
121  2944  0.1%
120  2034  < 0.1%
118  1962  < 0.1%
117  14301  0.3%
112  26659  0.5%
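Several figures above are consistent with run_up_distance having been stored as a signed 8-bit integer after holding values above 127:

- The 5.0 MiB memory size works out to about one byte per row.
- The minimum is exactly -128, the smallest value an int8 can hold.
- 8.7% of the distances are negative.

This is a hypothesis that the report itself cannot confirm. The sketch below only shows how two's-complement wraparound would behave. `as_int8` and `unwrap_int8` are illustrative helpers, and the inverse is valid only if the true values lay in 0..255. Under that assumption, the 5th percentile of -92 would correspond to 164 before the cast.

```python
MIB = 1024 ** 2
N_ROWS = 5_228_430

# 5.0 MiB over 5,228,430 rows is about one byte per value.
bytes_per_value = round(5.0 * MIB / N_ROWS)


def as_int8(value):
    """Store an integer in a signed 8-bit slot (two's-complement wraparound)."""
    return (value + 128) % 256 - 128


def unwrap_int8(stored):
    """Hypothetical inverse: valid only if the true values lay in 0..255."""
    return stored % 256


print(bytes_per_value)                            # 1
print(as_int8(126), as_int8(128), as_int8(164))   # 126 -128 -92
print(unwrap_int8(-128), unwrap_int8(-92))        # 128 164
```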

race_type
Categorical

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.9 MiB
CLM  1142801
MSW  1089307
MCL  920650
STK  730102
AOC  541117
Other values (7)  804453

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 15685290
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CLM
2nd row: CLM
3rd row: CLM
4th row: CLM
5th row: CLM

Common Values

Value  Count  Frequency (%)
CLM  1142801  21.9%
MSW  1089307  20.8%
MCL  920650  17.6%
STK  730102  14.0%
AOC  541117  10.3%
ALW  522748  10.0%
STR  186744  3.6%
SOC  36069  0.7%
SST  27560  0.5%
WCL  22386  0.4%
Other values (2)  8946  0.2%

Length

[Figure: Histogram of lengths of the category]
Value  Count  Frequency (%)
clm  1142801  21.9%
msw  1089307  20.8%
mcl  920650  17.6%
stk  730102  14.0%
aoc  541117  10.3%
alw  522748  10.0%
str  186744  3.6%
soc  36069  0.7%
sst  27560  0.5%
wcl  22386  0.4%
Other values (2)  8946  0.2%

Most occurring characters

Value  Count  Frequency (%)
M  3157018  20.1%
C  2667283  17.0%
L  2608585  16.6%
S  2102028  13.4%
W  1638701  10.4%
A  1063865  6.8%
T  944406  6.0%
K  730102  4.7%
O  577186  3.7%
R  186744  1.2%
Other values (2)  9372  0.1%

Most occurring categories

Value  Count  Frequency (%)
Uppercase Letter  15685290  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
M  3157018  20.1%
C  2667283  17.0%
L  2608585  16.6%
S  2102028  13.4%
W  1638701  10.4%
A  1063865  6.8%
T  944406  6.0%
K  730102  4.7%
O  577186  3.7%
R  186744  1.2%
Other values (2)  9372  0.1%

Most occurring scripts

Value  Count  Frequency (%)
Latin  15685290  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
M  3157018  20.1%
C  2667283  17.0%
L  2608585  16.6%
S  2102028  13.4%
W  1638701  10.4%
A  1063865  6.8%
T  944406  6.0%
K  730102  4.7%
O  577186  3.7%
R  186744  1.2%
Other values (2)  9372  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15685290  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
M  3157018  20.1%
C  2667283  17.0%
L  2608585  16.6%
S  2102028  13.4%
W  1638701  10.4%
A  1063865  6.8%
T  944406  6.0%
K  730102  4.7%
O  577186  3.7%
R  186744  1.2%
Other values (2)  9372  0.1%

purse
Real number (ℝ≥0)

Distinct: 70
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88878.90778
Minimum: 16000
Maximum: 1500000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.9 MiB

Quantile statistics

Minimum: 16000
5th percentile: 28000
Q1: 41000
Median: 62000
Q3: 80000
95th percentile: 200000
Maximum: 1500000
Range: 1484000
Interquartile range (IQR): 39000

Descriptive statistics

Standard deviation: 127824.0255
Coefficient of variation (CV): 1.438181777
Kurtosis: 42.94515382
Mean: 88878.90778
Median Absolute Deviation (MAD): 20000
Skewness: 5.974790005
Sum: 4.646971478 × 10¹¹
Variance: 1.63389815 × 10¹⁰
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
41000  353152  6.8%
62000  281264  5.4%
70000  279714  5.3%
75000  250906  4.8%
90000  242703  4.6%
60000  234699  4.5%
28000  228702  4.4%
80000  208012  4.0%
100000  195429  3.7%
55000  181012  3.5%
Other values (60)  2772837  53.0%

Minimum 10 values

Value  Count  Frequency (%)
16000  8291  0.2%
20000  8030  0.2%
22000  8628  0.2%
24000  4784  0.1%
25000  18138  0.3%
28000  228702  4.4%
30000  66496  1.3%
31000  6463  0.1%
32000  59805  1.1%
33000  138581  2.7%

Maximum 10 values

Value  Count  Frequency (%)
1500000  5520  0.1%
1250000  5568  0.1%
1200000  3258  0.1%
1000000  22844  0.4%
850000  5040  0.1%
750000  25145  0.5%
700000  19404  0.4%
600000  11914  0.2%
500000  32488  0.6%
400000  34385  0.7%
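The purse summary mixes two scales. Most rows sit between 41,000 and 80,000 (the IQR), while a thin tail reaches 1,500,000, which is what drives the skewness of about 6. The sketch below does three things with figures from this section:

- It checks the Sum and Variance, written above in scientific notation.
- It applies Tukey's upper fence (Q3 + 1.5 × IQR). This is a common rule of thumb swapped in for illustration; the report does not compute it.
- It measures how many rows sit at the ten largest purses.

```python
N_ROWS = 5_228_430
mean, std = 88878.90778, 127824.0255   # Mean / Standard deviation above
q1, q3 = 41000, 80000                  # Q1 / Q3 above

total = mean * N_ROWS
print(f"{total:.9e}")      # 4.646971478e+11  (Sum above)
print(f"{std ** 2:.8e}")   # 1.63389815e+10   (Variance above)

# Tukey's upper fence: a rule of thumb, not something the report computes.
upper_fence = q3 + 1.5 * (q3 - q1)
print(upper_fence)  # 138500.0

# Row counts for the ten largest purses ("Maximum 10 values"), all >= 400,000.
top_purse_counts = [5_520, 5_568, 3_258, 22_844, 5_040, 25_145, 19_404, 11_914, 32_488, 34_385]
share = sum(top_purse_counts) / N_ROWS
print(f"{100 * share:.1f}%")  # 3.2%
```

The 95th percentile (200,000) already exceeds the 138,500 fence. At least 5% of rows therefore lie beyond it, so the fence flags an entire race tier rather than a handful of anomalies.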

post_time
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 360
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 420.4248074
Minimum: 100
Maximum: 1259
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.0 MiB

Quantile statistics

Minimum: 100
5th percentile: 125
Q1: 233
Median: 349
Q3: 516
95th percentile: 1230
Maximum: 1259
Range: 1159
Interquartile range (IQR): 283

Descriptive statistics

Standard deviation: 280.2544999
Coefficient of variation (CV): 0.6665983903
Kurtosis: 2.912988755
Mean: 420.4248074
Median Absolute Deviation (MAD): 126
Skewness: 1.754678604
Sum: 2198161676
Variance: 78542.58471
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
130  94327  1.8%
1250  94087  1.8%
223  61610  1.2%
551  56106  1.1%
445  53938  1.0%
412  52381  1.0%
202  50544  1.0%
518  49133  0.9%
100  46225  0.9%
308  45920  0.9%
Other values (350)  4624159  88.4%

Minimum 10 values

Value  Count  Frequency (%)
100  46225  0.9%
101  8926  0.2%
102  37246  0.7%
103  19977  0.4%
104  6638  0.1%
105  2198  < 0.1%
108  1710  < 0.1%
109  1698  < 0.1%
115  23589  0.5%
116  9350  0.2%

Maximum 10 values

Value  Count  Frequency (%)
1259  17295  0.3%
1258  7655  0.1%
1257  1848  < 0.1%
1256  11315  0.2%
1255  20626  0.4%
1254  4554  0.1%
1252  4956  0.1%
1251  17554  0.3%
1250  94087  1.8%
1249  8985  0.2%
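Several features of post_time suggest it may be a 12-hour clock time written as h:mm without the colon (1250 = 12:50, 100 = 1:00), although the report does not state the encoding:

- Every value listed above has a last-two-digit part below 60.
- The values range from 100 to 1259.
- There are 360 distinct values, fewer than the 720 minutes in 12 hours.

If that reading holds, the raw numeric statistics (mean 420.4, skewness 1.75) mix two scales, because 12:xx posts sort after 5:xx posts even though they come earlier in the day. The helper below is hypothetical and also assumes all posts fall between noon and midnight.

```python
def minutes_after_noon(post_time):
    """Hypothetical decoding of post_time as a 12-hour h:mm clock time.

    Assumes every post falls between noon and midnight: 12:xx maps to 0-59,
    1:00-11:59 map to 60-719.
    """
    hours, minutes = divmod(post_time, 100)
    if not (1 <= hours <= 12 and 0 <= minutes <= 59):
        raise ValueError(f"{post_time} is not a valid h:mm value")
    return (hours % 12) * 60 + minutes


# Every post_time listed in the three tables above decodes without error.
listed = [130, 1250, 223, 551, 445, 412, 202, 518, 100, 308,
          101, 102, 103, 104, 105, 108, 109, 115, 116,
          1259, 1258, 1257, 1256, 1255, 1254, 1252, 1251, 1249]
decoded = [minutes_after_noon(t) for t in listed]

print(minutes_after_noon(1250), minutes_after_noon(100), minutes_after_noon(518))  # 50 60 318
```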

weight_carried
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 35
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.7132914
Minimum: 110
Maximum: 160
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.0 MiB

Quantile statistics

Minimum: 110
5th percentile: 115
Q1: 118
Median: 120
Q3: 122
95th percentile: 125
Maximum: 160
Range: 50
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.351880945
Coefficient of variation (CV): 0.03605138172
Kurtosis: 25.79531531
Mean: 120.7132914
Median Absolute Deviation (MAD): 2
Skewness: 3.737747874
Sum: 631140994
Variance: 18.93886776
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=35)]

Common Values

Value  Count  Frequency (%)
118  805184  15.4%
120  768432  14.7%
119  747592  14.3%
122  628713  12.0%
124  510510  9.8%
121  457490  8.8%
123  384078  7.3%
125  204079  3.9%
126  133556  2.6%
117  126486  2.4%
Other values (25)  462310  8.8%

Minimum 10 values

Value  Count  Frequency (%)
110  328  < 0.1%
111  25915  0.5%
112  18884  0.4%
113  61619  1.2%
114  82722  1.6%
115  88757  1.7%
116  115239  2.2%
117  126486  2.4%
118  805184  15.4%
119  747592  14.3%

Maximum 10 values

Value  Count  Frequency (%)
160  1062  < 0.1%
158  1062  < 0.1%
156  9624  0.2%
155  643  < 0.1%
154  1864  < 0.1%
153  8026  0.2%
152  6472  0.1%
150  4539  0.1%
149  1918  < 0.1%
148  3593  0.1%

jockey
Categorical

HIGH CARDINALITY

Distinct: 178
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 39.9 MiB
Manuel Franco  444024
Dylan Davis  381018
Jose Lezcano  344802
Junior Alvarado  299324
Irad Ortiz Jr.  282044
Other values (173)  3477218

Length

Max length: 26
Median length: 22
Mean length: 14.10091901
Min length: 8
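No categorical variable has missing values, so each one's Total characters should equal its Mean length times the 5228430 rows. A quick consistency check across all six categorical variables in this report:

```python
N_ROWS = 5_228_430

# (Mean length, Total characters) for each categorical variable, from this report.
text_stats = {
    "track_id": (3, 15_685_290),
    "program_number": (3, 15_685_290),
    "course_type": (1, 5_228_430),
    "track_condition": (3, 15_685_290),
    "race_type": (3, 15_685_290),
    "jockey": (14.10091901, 73_725_668),
}

for name, (mean_length, total_chars) in text_stats.items():
    # The printed mean is rounded, so compare after rounding the product.
    assert round(mean_length * N_ROWS) == total_chars, name
    print(f"{name}: {mean_length} x {N_ROWS} = {total_chars}")
```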

Characters and Unicode

Total characters: 73725668
Distinct characters: 51
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Andre Shivnarine Worrie
2nd row: Andre Shivnarine Worrie
3rd row: Andre Shivnarine Worrie
4th row: Andre Shivnarine Worrie
5th row: Andre Shivnarine Worrie

Common Values

Value  Count  Frequency (%)
Manuel Franco  444024  8.5%
Dylan Davis  381018  7.3%
Jose Lezcano  344802  6.6%
Junior Alvarado  299324  5.7%
Irad Ortiz Jr.  282044  5.4%
Eric Cancel  248857  4.8%
Jose L. Ortiz  247944  4.7%
Kendrick Carmouche  246683  4.7%
Joel Rosario  232221  4.4%
Luis Saez  231061  4.4%
Other values (168)  2270452  43.4%

Length

[Figure: Histogram of lengths of the category]
Value  Count  Frequency (%)
jose  598981  4.9%
jr  546373  4.5%
ortiz  529988  4.4%
manuel  454841  3.7%
franco  444024  3.7%
r  424404  3.5%
luis  400512  3.3%
davis  383597  3.2%
dylan  381018  3.1%
lezcano  344802  2.8%
Other values (308)  7627810  62.9%

Most occurring characters

Value  Count  Frequency (%)
a  7693318  10.4%
(space)  6907920  9.4%
e  6426641  8.7%
r  5788644  7.9%
n  4584054  6.2%
o  4449303  6.0%
i  4065752  5.5%
l  3143444  4.3%
J  2442535  3.3%
z  2305874  3.1%
Other values (41)  25918183  35.2%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  53142084  72.1%
Uppercase Letter  12166427  16.5%
Space Separator  6907920  9.4%
Other Punctuation  1508408  2.0%
Dash Punctuation  829  < 0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
a  7693318  14.5%
e  6426641  12.1%
r  5788644  10.9%
n  4584054  8.6%
o  4449303  8.4%
i  4065752  7.7%
l  3143444  5.9%
z  2305874  4.3%
s  2232138  4.2%
c  2223535  4.2%
Other values (16)  10229381  19.2%
Uppercase Letter
Value  Count  Frequency (%)
J  2442535  20.1%
R  1364989  11.2%
L  1123479  9.2%
C  1005867  8.3%
D  963403  7.9%
M  874165  7.2%
H  560294  4.6%
O  543815  4.5%
S  511289  4.2%
F  475400  3.9%
Other values (12)  2301191  18.9%
Space Separator
Value  Count  Frequency (%)
(space)  6907920  100.0%
Other Punctuation
Value  Count  Frequency (%)
.  1508408  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-  829  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  65308511  88.6%
Common  8417157  11.4%

Most frequent character per script

Latin
Value  Count  Frequency (%)
a  7693318  11.8%
e  6426641  9.8%
r  5788644  8.9%
n  4584054  7.0%
o  4449303  6.8%
i  4065752  6.2%
l  3143444  4.8%
J  2442535  3.7%
z  2305874  3.5%
s  2232138  3.4%
Other values (38)  22176808  34.0%
Common
Value  Count  Frequency (%)
(space)  6907920  82.1%
.  1508408  17.9%
-  829  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  73725668  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
a  7693318  10.4%
(space)  6907920  9.4%
e  6426641  8.7%
r  5788644  7.9%
n  4584054  6.2%
o  4449303  6.0%
i  4065752  5.5%
l  3143444  4.3%
J  2442535  3.3%
z  2305874  3.1%
Other values (41)  25918183  35.2%

odds
Real number (ℝ≥0)

Distinct: 656
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1482.919877
Minimum: 0
Maximum: 19100
Zeros: 978
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 10.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 115
Q1: 335
Median: 730
Q3: 1740
95th percentile: 5675
Maximum: 19100
Range: 19100
Interquartile range (IQR): 1405

Descriptive statistics

Standard deviation: 1952.415229
Coefficient of variation (CV): 1.316601969
Kurtosis: 9.102398076
Mean: 1482.919877
Median Absolute Deviation (MAD): 490
Skewness: 2.701129674
Sum: 7753342770
Variance: 3811925.225
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
420  45315  0.9%
430  43158  0.8%
460  42417  0.8%
410  40648  0.8%
440  38635  0.7%
400  38563  0.7%
550  37430  0.7%
520  37132  0.7%
510  36374  0.7%
490  35402  0.7%
Other values (646)  4833356  92.4%

Minimum 10 values

Value  Count  Frequency (%)
0  978  < 0.1%
5  293  < 0.1%
15  2497  < 0.1%
20  3292  0.1%
25  3805  0.1%
30  5142  0.1%
35  6939  0.1%
40  7563  0.1%
45  9170  0.2%
50  7556  0.1%

Maximum 10 values

Value  Count  Frequency (%)
19100  409  < 0.1%
17425  314  < 0.1%
16900  409  < 0.1%
16800  321  < 0.1%
15275  1074  < 0.1%
15000  404  < 0.1%
14850  275  < 0.1%
14550  289  < 0.1%
14175  304  < 0.1%
14125  310  < 0.1%

Interactions

[Figure: grid of pairwise interaction plots for the 10 numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
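The rank-based definition above can be sketched in plain Python. First rank each variable, then compute Pearson's r on the ranks. Tied values here get the average of the ranks they span; that is a common convention and an assumption in this sketch. The helper names `ranks`, `pearson` and `spearman` are illustrative and are not part of pandas-profiling.

```python
from statistics import mean

def pearson(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # the 1/(n - 1) factors cancel, so they are omitted

def ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))
```

For example, take x = [1, 2, 3, 4, 5] and y = x³. Then `spearman(x, y)` is 1 (up to floating-point rounding) because the relationship is perfectly monotonic. `pearson(x, y)` is only about 0.94 because the relationship is not linear.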

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables. As a result, for an exact linear relationship the slope of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
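A minimal sketch of this definition uses the sample covariance and sample standard deviations. The n - 1 factors cancel, so that choice does not change r. The example also illustrates the invariance property. Shifting or positively rescaling a variable leaves r unchanged, and a negative scale factor flips its sign. An exact linear relationship gives r = 1 regardless of its positive slope. `covariance` and `pearson_r` are hypothetical helper names.

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))

x = [2.0, 4.0, 5.0, 7.0, 11.0]
y = [1.0, 3.0, 2.0, 6.0, 9.0]
r = pearson_r(x, y)

# Separate shifts and positive rescalings of x and y leave r unchanged.
r_affine = pearson_r([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])
# A negative scale factor only flips the sign of r.
r_flipped = pearson_r([-a for a in x], y)
# An exact linear relationship gives r = 1 whatever its (positive) slope.
r_line = pearson_r(x, [0.1 * a + 5 for a in x])
```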

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

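To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant if X and Y order its two observations the same way, and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations. This is the τ_a form; the τ_b variant adjusts the denominator for ties.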
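Counting concordant and discordant pairs directly gives the τ_a form described above. This O(n²) sketch is for illustration only. Library routines such as `scipy.stats.kendalltau` use faster algorithms and default to the tie-adjusted τ_b variant, so their results can differ when ties are present. `kendall_tau_a` is a hypothetical helper name.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y: the pair counts as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs
```

Take x = [1, 2, 3, 4] and y = [1, 3, 2, 4]. Five of the six pairs are concordant and one is discordant, so τ = (5 - 1)/6 ≈ 0.667.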

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. It ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators of Cramér's V are known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
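The sketch below computes Cramér's V from a contingency table built with `collections.Counter`. The χ² statistic gives φ² = χ²/n. The uncorrected V is sqrt(φ² / min(k - 1, r - 1)) for r row categories and k column categories. With `bias_corrected=True`, the sketch applies the Bergsma correction as it is commonly stated. φ² is reduced by (k - 1)(r - 1)/(n - 1) and clipped at 0. k and r are replaced by k - (k - 1)²/(n - 1) and r - (r - 1)²/(n - 1). This illustrates the technique under those assumptions and is not pandas-profiling's own code. Both variables need at least two categories.

```python
from collections import Counter
from math import sqrt

def cramers_v(x, y, bias_corrected=True):
    """Cramér's V for two equal-length sequences of category labels."""
    n = len(x)
    row_totals, col_totals = Counter(x), Counter(y)
    observed = Counter(zip(x, y))  # contingency table as {(row, col): count}

    # Pearson chi-squared statistic over every cell of the r x k table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    if not bias_corrected:
        return sqrt(phi2 / min(k - 1, r - 1))

    # Bergsma (2013) bias correction of phi^2 and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))
```

Perfectly associated labels give V = 1, and a balanced table with no association gives V = 0.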

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. The phik package is extensively documented.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.

Sample

First rows

   track_id  race_date   race_number  program_number  trakus_index  latitude   longitude   distance_id  course_type  track_condition  run_up_distance  race_type  purse  post_time  weight_carried  jockey                   odds
0  AQU       2019-01-01  9            6               72            40.672902  -73.827607  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
1  AQU       2019-01-01  9            6               73            40.672946  -73.827587  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
2  AQU       2019-01-01  9            6               74            40.672990  -73.827568  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
3  AQU       2019-01-01  9            6               63            40.672510  -73.827781  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
4  AQU       2019-01-01  9            6               64            40.672553  -73.827762  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
5  AQU       2019-01-01  9            6               65            40.672596  -73.827742  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
6  AQU       2019-01-01  9            6               66            40.672640  -73.827723  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
7  AQU       2019-01-01  9            6               67            40.672683  -73.827703  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
8  AQU       2019-01-01  9            6               68            40.672726  -73.827684  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090
9  AQU       2019-01-01  9            6               57            40.672243  -73.827903  600          D            GD               48               CLM        25000  420        120             Andre Shivnarine Worrie  2090

Last rows

         track_id  race_date   race_number  program_number  trakus_index  latitude   longitude   distance_id  course_type  track_condition  run_up_distance  race_type  purse   post_time  weight_carried  jockey        odds
5228420  AQU       2019-11-23  9            2               174           40.672080  -73.830989  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228421  AQU       2019-11-23  9            2               175           40.672038  -73.831008  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228422  AQU       2019-11-23  9            2               176           40.671996  -73.831027  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228423  AQU       2019-11-23  9            2               177           40.671955  -73.831045  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228424  AQU       2019-11-23  9            2               166           40.672403  -73.830834  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228425  AQU       2019-11-23  9            2               167           40.672363  -73.830853  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228426  AQU       2019-11-23  9            2               168           40.672321  -73.830873  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228427  AQU       2019-11-23  9            2               169           40.672281  -73.830893  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228428  AQU       2019-11-23  9            2               170           40.672240  -73.830913  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120
5228429  AQU       2019-11-23  9            2               171           40.672200  -73.830932  1100         T            GD               72               STK        200000  353        124             Joel Rosario  1120